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Abstract 

I make three conceptual points regarding multigrid methods for computing 
propagators in lattice gauge theory: 1) The class of operators handled by the 
algorithm must be stable under coarsening. 2) Problems related by symmetry 
should have solution methods related by symmetry. 3) It is crucial to distinguish 
the vector space V from its dual space V* . All the existing algorithms violate 
one or more of these principles. 



There has recently been much interest in developing multigrid methods for solving 
large linear systems with disordered coefficients |Lj, H> Hl> an d in particular for comput- 
ing the bosonic or fermionic propagator in a background gauge field. [] Many interest- 
ing ideas in this direction have been proposed by the Amsterdam [|], |, ||, [7j |8|, |9|, [LO 



Boston [|Tl], [lj§ 0, |TJ, 0, Hamburg |T|, |T7|, [19J, 0, 0, 0, @, |25f and Is- 



raeli |2^, |23, |28|, |29], [3(J, |31], |32|, [33|, |34], |35| groups. However, all these discussions 
have missed what I consider to be some key conceptual points. Perhaps a brief note 
explaining these points is therefore warranted. 

First point. We are interested in solving linear systems of the form Ax = b for 
some class of linear operators A. So the first order of business must be to specify 
clearly the class C of operators to which our algorithm is intended to apply. This is 
not a completely trivial matter, because 

1) For a multigrid (as opposed to two-grid) algorithm to be well-defined, the class 
C must be stable under coarsening.^ 

In other words, we start with some class Co of operators on the finest grid, for which 
we want to solve Ax = b. But then our multigrid algorithm produces operators on the 
first coarse grid, which belong to some quite possibly larger class C\ — what this class 
is depends on our procedure for choosing interpolation, restriction and coarse-grid 
operators. If we are then to continue to the second coarse grid, this procedure must 
be defined for all the operators in this larger class. If we want to allow an arbitrary 
number of grids, then we cannot stop short of a class C Z> Cq that is stable under 
coarsening. 

For example, our initial interest might be in nearest- neighbor Laplace operators 
in an SU(N) gauge field, with re-independent values for the hopping parameter (3 and 
the mass m: 

C = {A = -(3Au + m 2 : U xy G SU(N), (3 > 0, m 2 e R, A > 0} , (1) 

where 

!— 2d for x = y 
U xy for |x- 2/| = l (2) 
otherwise 

But quite possibly our coarsening procedure will produce a larger class of operators 
on the coarse grids: the mass may vary from site to site, or may even become non- 
trivial in color space; and the off-diagonal elements may have magnitudes that vary 



1 Typically the gauge field U is chosen stochastically from some probability distribution P(U), 
e.g. either that of quenched lattice gauge theory or else that of lattice gauge theory with dynamical 
fcrmions. 

2 I emphasize that the issue here is not whether the algorithm performs well or poorly; the issue 
is whether the algorithm is meaningful at all. 
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from link to link, or they may even leave the space M+ x SU(N) and enter its linear 
span (which for iV > 3 is the space of all N x N complex matrices) .0 If this occurs, 
we must then go back to the beginning, and define our coarsening procedure for all 
the operators in this larger class — and so on successively until the class stabilizes.^ 
This comment is relevant to the ground-state-projection (GSP) multigrid algo- 
rithms proposed by the Boston [11, 12, 13, 14, [15|] and Hamburg |17, 18, [II], [HJ 



2~T| , |2~4"| , p5| groups, in which the range of the interpolation operator is spanned by 
the lowest eigenvectors of the operator A modified to impose Neumann boundary 
conditions on 2 x 2 or 3 x 3 blocks. The trouble here is that the concept of "Neu- 
mann b.c", though well-defined for operators of the class ([]]), has no unambiguous 
meaning (as far as I can tell) when the off-diagonal elements of A are more general 
than (positive real number) x (unitary matrix). Possibly an appropriate definition 
can be found, but this is a highly nontrivial problem, particularly when the required 
covariances are taken into account (see the "second point" below). The situation is 
different for the Amsterdam version of GSP multigrid 0, ||, ||, ||, |9f (see also |T7|), 
which employs Dirichlet b.c. on 2 x 2 blocks. This operator is well-defined in general: 
one simply sets to zero those off-diagonal elements of A that cross block boundaries. 
On the other hand, it is unclear whether Dirichlet b.c. give a good interpolation: for 
example, in the case of the pure Laplacian (U = /), Dirichlet b.c. give a reasonable 
interpolation only on 2 x 2 blocks, where they are equivalent to Neumann b.c. plus a 
multiple of the identity matrix (and yield piecewise-constant interpolation); on larger 
blocks, Dirichlet b.c. yield sine-wave interpolation kernels, which do not perform well 
(because smooth functions, e.g. constants, cannot be interpolated accurately). This 
is not necessarily an argument against Dirichlet b.c. on 2 x 2 blocks, but it does raise 
doubts about whether the concept behind GSP using Dirichlet b.c. is correct. 

Second point. Suppose that there is a transformation T: C — > C that is "trivially 
calculable" , in the sense that from the solution of Ax = b we can trivially determine 
(in a negligible CPU time) the solution of T(A)x = b', and conversely. Then any 
sensible method (whether multigrid or anything else) for solving problems in C ought 
to be covariant under T, in the sense that using the method to solve Ax = b ought 
to give exactly the same approximate solutions, in exactly the same CPU time, as 
transforming to T(A)x = b', using the method to solve this latter equation, and then 
transforming back. Why? Well, suppose it weren't so. Then we would have two 
alternative methods for solving Ax = b and T(A)x = b', and presumably one of them 
would be better than the other; we should then use this method for both equations, 

3 This is a "dielectric gauge field" in the language of Mack J36| . 

4 This criterion is irrelevant in unigrid approaches (as proposed e.g. in p2[ ^3|), since here 
the concept of a coarse-grid operator never arises: everything is done on the original (fine) grid. 
However, these approaches have a severe drawback: the interpolation kernel must be computed on 
arbitrarily big blocks (e.g. 2 k or 3 fc for arbitrarily large k), and this can be prohibitive if the CPU 
time for this computation grows faster than linearly in the block volume. 
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and discard the other method. The principle here is quite general, and has nothing 
to do per se with linear equations: 



2) Problems related by symmetry should have solution methods related by sym- 
metry. (A special case is: Symmetric problems should have symmetric solution 
methods.) 

Of course, this principle is not always valid: there could arise, in principle, cases 
of "spontaneous symmetry breaking in algorithm space", in which each of a family 
of non-T-covariant algorithms would be better than the best T-covariant algorithm. 
But I find this possibility extremely unlikely in the present context. 

This principle has powerful consequences in combination with the first point. For 
example, the class C defined above is mapped into itself under SU (N) gauge trans- 
formations and under space-independent rescalings A —>■ XA; and many workers have 
recognized that a good algorithm ought to be covariant under these transformations. 
But the larger class of operators produced by coarsening may well be mapped into 
itself by space- dependent rescalings A xy — > \ x A xy X* or even by GL(N, C) gauge trans- 
formations A xy — > U x A xy U* with U x G GL(N,C). The multigrid algorithm ought 
then to be covariant under this larger group; but none of the published algorithms, 
to my knowledge, have this property^ 

Third point. My third point is more abstract, but it is related to the second one. 
Consider again Ax = b. On both physical and mathematical grounds, 

3) It is crucial to distinguish between the vector space V of solutions x and the 
vector space W of "forcing terms" b.f\ 

Thus, x may be a displacement while b is a force; x may be a magnetization while b 
is a magnetic field; and so forth. Most commonly in physical applications, we have 
W = V* = dual space of V: that is, there is a natural nondegenerate bilinear mapping 
V x W — > JR., which typically can be interpreted as an energy (force ■ displacement, 
magnetic field • magnetization, etc.)|] In this case the principle becomes 

5 The parallel-transported multigrid (PTMG) algorithm proposed by the Israeli group [^6[ |27], |2^, 
p9 . 30, 31. 32, 33[ |34|, [35| avoids this problem by forcing the coarse-grid operator to lie in the original 



class Co- The price paid is that the coarse-grid operator does not satisfy the Galerkin condition 
Ai = RAqP (see footnote [j^ below for discussion of why the Galerkin condition is desirable). The 
divergent iteration observed at small fermion mass [^7], ^8, 29 3(], |34|, [35| is probably related to this 
choice of a non-Galerkin coarse-grid operator. 

6 Two vector spaces of the same finite dimension are of course isomorphic, but they are not 
naturally isomorphic. Otherwise put, there are many linear bijections from V to W, and if we want 
to single out one of them we must find a physical reason for doing so. 

7 I use the notations £(x) = (£,x) = (x,£) to denote the natural bilinear mapping acting on 
I e V* and x € V. 
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3') It is crucial to distinguish between the vector space V and its dual space V*. 
(In a language perhaps more familiar to physicists: it is crucial to distinguish 
between contravariant and covariant tensors.) 

This principle is well known to differential geometers and general relativists, who 
have learned (since roughly the 1950's and 1960's, respectively) the advantages of 
a "coordinate-free" (basis-independent) notation. However, this principle is often 
forgotten by elementary-particle physicists, because we typically learned our linear 
algebra in the context of either quantum mechanics or matrix theory — each of which 
works not on a bare vector space V, but rather on a vector space V equipped with 
additional structure: 

(a) Quantum mechanics employs a Hilbert space, i.e. a vector space equipped with 
an inner product (positive-definite bilinear form on V). This inner product induces an 
identification map between V and V*. In quantum mechanics the inner product has, 
of course, a good physical interpretation (transition amplitude). In other contexts we 
should introduce such an identification operator only if it has likewise a good physical 
or mathematical raison d'etre. 

(b) Matrix theory is linear algebra referred to a particular basis — and this can be 
very misleading, unless the basis has been selected by some good physical or mathe- 
matical criterion. For example, if we have chosen particular bases for V and V* , then 
the matrix with entries 5ij defines a particular identification map from V to V*; but 
this identification is not natural, i.e. it is of no more physical or mathematical interest 
than any other linear bijection from V to V*. (In different bases this same identifica- 
tion would have different matrix representations, not equal to 8ij.) In particular, it 
is misleading to employ the term "identity matrix" in this context; this term ought 
to be reserved for the matrix representation of the identity operator, which maps a 
space V into itself. (This operator has the matrix representation 5ij with respect to 
any basis.) 

As soon as we distinguish between V and V* , we must distinguish two very dif- 
ferent types of linear operators: 

• Linear operators A: V — > V. 

• Linear operators Q: V — > V* [or what is equivalent, bilinear forms Q: V x V — > 




8 The correspondence is Q(x,y) — (Qx)(y) for i,t/£ V. 

9 In the complex case one has to consider sesgmlinear forms, and the role of V* is played by the 
antidual space (i.e. the space of all aniilinear maps from V to C). For simplicity we shall henceforth 
restrict attention to the real case. This is no loss of generality, because a complex vector space of 
dimension n can be considered as a real space of dimension 2n; and under this correspondence 
eigenvalues are preserved (more precisely, a + bi becomes the pair a±bi), hermiticity corresponds 
to symmetry, etc. 
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Only for operators from V to itself can we discuss eigenvalues, spectral radius, trace, 
determinant, etc.; only for operators from V to V* can we discuss symmetry (or 
hermiticity) and positive- (semi) defmitenessQQ 

Thus, in most physical contexts the Laplacian arises naturally as an operator 
from V to V*: the corresponding quadratic form is the energy functional. It is thus 
meaningless to discuss the "eigenvalues" of such an operator. (What is actually meant 
by such discussions will be explained below.) 

Let us examine the multigrid method in this context. We are given a fine-grid 
operator A : Vq —>Vq — usually symmetric and positive-definite — for which we want 
to solve Aqx = b. We then introduce a coarse-grid space V\, along with restriction, 
interpolation (prolongation) and coarse-grid operators. On what spaces do these 
operators act? Examination of the multigrid algorithm indicates that restriction 
is always applied to a residual (right-hand side); thus, R: Vq — > V*. Similarly, 
interpolation is always applied to an error (left-hand side); thus, P: V\ — »■ V . Finally, 
the coarse-grid operator acts similarly to the fine-grid operator, i.e. A\\ V\ — > V*. We 
thus have the following diagram: 

r (3) 

It follows that the oft-imposed condition R = P* is meaningful. Likewise, the 

10 Symmetry means that (Qx)(y) — {Qy){x), or equivalently that Q(x,y) — Q(y,x). Symmetry 
can also be written as Q — Q*, once we remember the definition of dual operator: if A: V — > W, 
then A*: W* — > V* is defined by (A*x)(y) = x(Ay) for x € W*, y € V. 

11 Positive-semidcfiniteness means that (Qx)(x) > for all x 6 V. Positive-definiteness means 
that (Qx)(x) > for all x € V, x ^ 0. 
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Galerkin condition A\ = RAqP is meanrng/u/.Q'^^ On the other hand, the con- 



dition RP = I Iflq , \17[ |19| , p2| , [23|| is meaningless. 

However, this is not the end of the story, because the multigrid algorithm contains 
an additional ingredient, namely a relaxation (smoothing) operator S : V x Vq* — > V 
which takes an approximate solution of Aqx = b and produces a new (and hopefully 
better) approximate solution. Usually this relaxation is a first-order stationary linear 
process 

<SoM) = S x + T b, (4) 

where S : Vq — > Vq, T : V * — > Vq and the consistency condition is T = (I — Sq)Aq 1 
[verify that this formula is meaningful!]. Such a process ordinarily arises from a 
splitting 

A = Bq + Cq, (5) 
where B : Vq — > V^* is some operator that is "easy to invert"; writing A x = b as 

12 Briggs ||^, pp. 67-69] and the Hamburg group |22|, |23| have given a very strong argument 
for the Galerkin choice of the coarse-grid operator. After fine-grid relaxation, we have an error 
eo = xq — Aq l b which satisfies the equation Aqc^ = ro, where ro = AqXq — b is the residual. Now, 
the multigrid philosophy is based on the idea that this error is smooth, in the sense that it is near to 
the range of the interpolation operator, i.e. eo « Pe\ for some vector ex € V\. This vector e x satisfies 
the equation RA Pei « r%, where ri = Rr is the coarse-grained residual. So this is the equation we 
should aim to solve on the coarse grid, i.e. we should take A\ = RAqP. (Note that this derivation 
does not need the condition RP = I ^2. 23 , which in fact is meaningless.) This derivation holds 



whether or not Aq is symmetric and/or positive-definite, and whether or not R = P* . 

A slight variant of this argument, using the = sign instead of ~, goes as follows: In the coarse- 
grid-correction phase of the multigrid algorithm, we replace xq by xq + Px\ for some x\ G V\. 
How should we choose x\l Ideally we would like the new vector xo + Px\ to have a zero residual, 
i.e. Aq{xq + Px\) — b = 0; but this is obviously too much to hope for in general, since diml/i < 
dimVb- The best we can hope for is to annihilate the smooth part of the residual, i.e. to have 
R[Aq(xq + Px\) — b] =0. (As before, we expect that the non-smooth part of the residual has 
already been made small by the fine-grid relaxation.) We should therefore aim to choose X\ so that 
RA Pxx = -r\. 

13 If the fine-grid operator Aq is symmetric (resp. symmetric and positive-semidefinite) and we 
make the variational choices R = P* and A\ — RAqP, then the coarse-grid operator A\ is likewise 
symmetric (resp. symmetric and positive-semidefinite) — a useful property. 

14 If the fine-grid operator Aq is symmetric and positive-definite, then we can give a strong 
argument §9], pp. 56-57] |o[ p. 2043] for the variational choices R = P* and Ax = RA P. Note 
first that solving A x — b is equivalent to minimizing the energy functional Hq(x) = A x) — 
{b,x). Now, in the coarse-grid-correction phase of the multigrid algorithm, we replace the current 
approximate solution Xq by a hopefully better approximate solution x + Px\, where X\ G V\. A 
sensible goal is to attempt to choose X\ so as to minimize Ho; this means that we attempt to minimize 
Hxixi) = H (x + Pxi) — \{x\, P*A Pxi) — (P*(b — A x ), x\) + const. This is precisely what the 
multigrid algorithm with R = P* and A\ = P*AqP does. (The only loophole in this logic is that 
one might entertain the possibility of "coarse-grid overcorrection" p5, [i2[ |l^, i.e. of choosing 
temporarily a sub-optimal approximate solution x\ in the hope that this will later bring benefits.) 
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Bqx = —Cqx + b suggests the iteration (Q) with 

S = -B^Co = I- B^Aq (6a) 
T = B, 1 (6b) 

[Conversely, any convergent first-order stationary linear iteration ([|) arises from a 
splitting B = T _1 = A (I — So) -1 .] A "good" choice of B (for a single-grid algo- 
rithm) is one that is "as close to Ao as possible" , subject to the condition that it be 
"easy to invert": this makes So as small as possible. 

Let us now examine the oft-made statement that "the slow modes of the (damped) 
Jacobi iteration correspond to the low eigenvalues of Ao" . This statement is in fact 
meaningless, because A does not map a space into itself, and thus does not have 
eigenvalues. But it is easy to see what the correct statement is: the slow modes of (f|) 
correspond to the eigenvalues near 1 of S , hence to the low eigenvalues of 5 ~ 1 A . 
Here Bq~ 1 Aq maps Vq into itself, and so does have eigenvalues. 

But we can also understand how the meaningless statement could have arisen. The 
(damped) Jacobi iteration is defined by letting B be (a multiple of) the diagonal part 
of Ao with respect to some particular basis. Now, for certain operators Aq in certain 
bases — for example, the Laplace operator —/3Ajj + m 2 in a site-wise basis — it 
happens that the diagonal part of A Q is given by a multiple of the identity matrix, 
so it is easy to think (incorrectly) that we are discussing the eigenvalues of A rather 
than those of B Q ~ 1 Ao. I emphasize that the performance of the Jacobi algorithm is 
a property not only of Ao but also of the choice of basis: for example, for the same 
Laplace operator A = — /3Au + m 2 with U = /, if we use a Fourier basis we have 
Bo = A , in which case the relaxation process has no slow modes. On the other hand, 
for more general operators A — such as Laplace operators with a site-dependent mass 
term — the diagonal part of Ao is not a multiple of the identity matrix, even in the 
site-wise basis; so in such a case the conventional statement is not merely misleading 
but is in fact incorrect. In all cases, the relevant quantities are the eigenvalues and 
eigenvectors of Bq Aq. 

Returning to the multigrid algorithm, we can summarize the situation as follows: 

i) The fine-grid problem is specified by an operator A : Vo — *■ V *, which is usually 
symmetric and positive-definite (as will be assumed henceforth). No other 
structure is present in the original problem. 

ii) Choosing a fine-grid relaxation method amounts to choosing a second operator 
Bo'. Vo —>■ Vq*. This operator is often (but not always) symmetric and positive- 
definite. 

15 The (damped) Jacobi choice of Bo is symmetric and positive-definite. The Gauss-Seidel or 
SOR choice of Bo is usually not symmetric, but it may be made symmetric (as well as positive- 
semidefinite) by explicit symmetrization, e.g. a forward sweep followed by a reverse sweep. Note also 
that Bq is symmetric if and only if So = I — B^ 1 Ao is Ao-symmetric (this means that AoSo = SqAq). 
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We must then choose the remaining elements of the multigrid algorithm: R, P and 
A\. Obviously these choices can and should depend on A and B ; but they should not 
depend on any additional structures unless we have a good physical or mathematical 
reason for introducing such structures into the problem. 



In footnotes |12|-|ll] above, I have given strong arguments for the variational choices 
R = P* and A\ = P*AqP. Henceforth these choices will be assumed. The only 
remaining choice is therefore that of the interpolation operator P. (This is, of course, 
the heart of the problem is disordered systems.) 

The goal in the choice of P is obviously to maximize the speed of convergence of 
the algorithm, measured in CPU-time units. Now, the asymptotic rate of convergence 
of any first-order stationary linear iteration is — log p(M), where p(M) is the spectral 
radius^ of the algorithm's iteration matrix M: V — * V. The iteration matrix of the 
multigrid algorithm is rather complicated — it depends on the number of levels, on 
the choice of V-cycle or W-cycle, etc. — so it is convenient to consider instead the 
iteration matrix of the idealized two-grid algorithm, i.e. the two-grid algorithm in 
which one assumes that the coarse-grid equation A\Xi = b\ is solved exactly. This is 
a useful warm-up problem prior to studying the complete multigrid algorithm, and 
there are both heuristic and rigorous connections between the two algorithms. Q Let 
us therefore concentrate henceforth on the iteration matrix of the idealized two-grid 
algorithm, for simplicity with only pre-smoothing: 

M = (I - PA^RAo)S . (7) 

It is worth noting that the operator PA^RAq. Vq — > Vq is the Ao-orthogonal projec- 
tion onto the range of -P.Q 

The condition for minimizing the spectral radius of Mo was given some years ago 
by Greenbaum [[EJ: If dimV^i = n 1; then p(M ) is minimized when Ran P consists of 

16 The spectral radius of an operator M: V — ► V is the largest absolute value of an eigenvalue. 

17 Obviously the good performance of the idealized two-grid algorithm is a necessary condition 
for the good performance of the multigrid algorithm. On the other hand, it is also a sufficient 
condition, at least for the W-cycle: if the energy norm of the idealized-two-grid iteration matrix is 
< (\/2/2) — e uniformly in the lattice size, then the energy norm of the multigrid iteration matrix 
is < 1 — (5 uniformly in the lattice size p3l Q. (The energy norm of an operator M : Vq — > Vb is 
||M|| = sup ((Mi, AqMx) / '{x, A x})^ 2 .) 

18 First proof (computational) : Clearly Ran PA^ 1 RAq C Ran P. To show that PA~^ 1 RAq acts as 
the identity on Ran P, let us apply it to a vector xq = Px\ where X\ € V\: we get PA^RAqPxi = 
Px\ since RAqP = A\. Finally, we note that PA^RAq is Ao-symmetric, i.e. Aq{PA^[ 1 RAq) = 
(PA^ 1 RA a )*A ; this is an immediate consequence of Aq = Aq, A\ = A\ and R = P*. 

Second proof (conceptual) : By construction, the coarse-grid-correction phase of the idealized two- 
grid algorithm with the variational choices R — P* and A\ = P*AqP replaces the vector xq by 
that vector xq + Pxi (x\ S V\) which minimizes the energy \({x$ + Pxi), Ao(xq + Pxi)). (Here 
we can assume without loss of generality a right-hand side 6 = 0.) This means that the operator 
/ — PA^RAq is the Ap-orthogonal projection onto the Ao-orthogonal complement of Ran P. 
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the ni eigenvectors of Bq 1 A with the smallest eigenvalues.^] Note that only the range 
of the interpolation is constrained here: we can compose P with an arbitrary bijective 
map in Vi, and as long as we use the variational choices R = P* and A\ = P*A P, 
this merely amounts to a redefinition of the coarse-grid fields, which has no effect on 
the sequence of approximants xq that the algorithm computes. 

To be sure, Greenbaum's condition is "impractical", for two reasons: Firstly, it 
is usually unfeasible to compute the eigenvectors of Bq 1 Aq (unless A and B are 
constant-coefficient difference operators), as this would take a CPU time of order 
(volume) 3 . Secondly, even if we could compute the eigenvectors of Bq X A , we would 
not want to use them as the columns of the interpolation matrix, because the matri- 
ces P, R and A\ would then be dense rather than sparse, and each iteration of the 
resulting multigrid algorithm would take a CPU time of order (volume) 2 . Neverthe- 
less, Greenbaum's condition is important because it indicates the ideal towards which 
we are striving, even though in practice this ideal will have to be traded off against 
considerations of computational complexity. 

It is in this context that we can consider the Hamburg group's recent proposal 



22| , |23| , which goes as follows: The "best" idealized two-grid algorithm is the one 
with the smallest spectral radius of M . But spectral radii are difficult to compute, 
so let us study instead some norm of M ; this changes the problem, but hopefully not 
by too much. Now, norms are often difficult to compute, too; but one norm which 
is easy to compute is the Frobenius (Hilbert-Schmidt) norm HM0II2 = tr MqMq. The 
first trouble — as the reader should by now be able to guess — is that this expression 
is meaningless: the range of Mq is in Vq, while the domain of M is Vq. However, we 
can fix it by defining instead^ 

HM0II2 = tiBoMoB^M* 

(= tr M Bq 1 M*B = ti B^M^BqMq = ix M^BqMqBq 1 ) . (8) 

This expression is meaningful without regard to any choice of basis, and it agrees 
with the preceding expression in any basis in which Bq is represented by a multiple 
of the identity matrix.^] 

But the trouble is really more fundamental: Why are we trying to minimize @, 
anyway? Minimizing (^j) with respect to P (with R and A\ fixed to the variational 
choices) is really no easier than minimizing the spectral radius (as far as I can tell); 



in 



Actually, Greenbaum's condition |45[ is slightly different, as she seeks to minimize the energy 



norm of Mq (see footnote 17 for its definition) rather than the spectral radius. The two criteria are 



equivalent whenever Bo is symmetric (i.e. Sq is ^-symmetric). 

20 A similar use of Bo — or more precisely, of the diagonal part of A$ — can be found in the 
norms defined by Ruge and Stiiben jf(| p. 78]. 

21 Such a basis can always be achieved by a GL(N, C) gauge transformation A xy — ► U x A xy U*. If 
the diagonal elements A xx are multiples of the identity in color space, it can be achieved simply by 
a space- dependent rescaling A xy — > X x A xy X*. 
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and even if it were, we wouldn't want to use the answer anyway, because it isn't 
sparse. Basically, the Hamburg condition — even if modified to map to the right 
spaces — suffers from the same defects as Greenbaum's original condition, without 
having its strongest virtues.0 

The real goal, I believe, is to minimize p(Mo) [or ||Mq||2 or something like it], 
subject to some constraint on the computational complexity of the resulting algorithm. 
There seem to be two main approaches: 

1) Geometric multigrid. Fix the sparsity pattern of P (and thus of A\), and 
minimize p(Mq) subject to this constraint. This amounts to fixing the spatial 
geometry of the coarse grids and the interpolation. 

2) Algebraic multigrid (AMG) ||46|| . Constrain roughly the number of nonzero 
entries in P and A\, but allow the algorithm to select the sparsity pattern of P 
(and thus of Ai) based on some analysis of the matrices A Q and B . 

The AMG approach is potentially more powerful, but also more complicated. In cases 
of very strong disorder, such as the random-resistor problem Jl|, [|, AMG is probably 
essential; but for gauge fields that are moderately smooth, the geometric approach 
may possibly work, and it makes sense to try it first. The problem of minimizing 
p(M ) [or some proxy for it] subject to a constrained sparsity pattern for P needs to 
be examined from first principles. 

Some readers may find the distinction between V and V* to be "pedantic" or 
"academic". I can say only that I have found this distinction to be an important 
clarifying principle — analogous to the distinction between meters and kilograms, or 
between contravariant and covariant tensors in general relativity^ — which helps in 
sorting out the kernel of meaning amid a mass of formulae.0 

22 In [^2[ the Hamburg group actually does something different from what I have just described: 
Firstly, instead of ||Mq||2 they consider || A^MqSq 1 }^- Secondly, they fix R (without justifying their 
choice by any optimality condition), and then minimize (^) with respect to P, imposing RP = / and 
the Galerkin condition A\ — RA P but obviously not the condition R = P* . I do not understand 
the logic behind either of these elements of their approach. 

23 For example, the 4- velocity of a fluid is naturally a vector field, while the electromagnetic vector 
potential is naturally a co vector field (= 1-form). In the presence of a metric tensor one can of course 
convert freely back and forth between vectors and covectors ("raising and lowering indices"); but 
the question is whether this operation is physically natural in the given context (that is, whether 
the physics of the problem at hand has anything to do with a spacetime metric). By insisting that 
raising-lowering operations be indicated explicitly, one facilitates such an inquiry. More generally, 
the goal is to highlight the physically relevant structures, and to eliminate the irrelevant ones. Does 
the problem at hand really need all the information in a metric? Can it perhaps be formulated 
on a differentiable manifold with no additional structure? Or on a manifold with only a conformal 
structure? Or on a manifold with only an affine connection? And so forth. 

24 In the multigrid literature, the distinction between V and V* can be found in [^0[ Sections II. B 
and II. C] and |^, Section 8]. Possibly there are also other references of which I am unaware. 



11 



Other readers may object to my advocacy of a basis-independent approach on the 
grounds that "the computer must of necessity work in a particular basis" . This is of 
course true, just as it is true that the article you are now reading must of necessity 
be written in a particular natural language (in this case English). Nevertheless, the 
content of this article should remain the same if, for example, it is translated to Serbo- 
Croatian; likewise, the content of the multigrid computation should remain the same 
if it is translated to another basis. The basis has no more physical relevance than does 
the choice of (x, y, z) axes in mechanics. In reality, the basis has only two relevant 
effects: on roundoff error, and on computational complexity . The former effect is quite 
easily handled by making rescalings (or GL(N, C) gauge transformations) to prevent 
the matrix elements from getting too big or too small; this is a mere computational 
detail, which does not alter the underlying algorithm. The computational complexity 
is, on the other hand, a serious issue: as mentioned above, it must be handled by 
imposing some constraints on the sparsity pattern of P. 

Fermions. The reader is probably wondering how fermions fit into the foregoing 
framework. The answer is that I am not completely sure, but here is my best attempt: 

The underlying structure of a fermionic model seems to be a pair of vector spaces 
V, V equipped with a distinguished bijective map E: V — > V (here E maps ipi onto 
its corresponding field ^i).0 The fermionic operator A = Ip + m then maps V —>■ V* 
(think of the bilinear form TpiAijifrj) . 

There are two main approaches to multigrid for fermions: 



1) Work directly with the fermionic operator A [11], [14, 26, 27|, |8, 2g [30, 3J, || 



2) Work with the "bosonic" operator A* A or A A* §, Q, §, |, [ITJ [TJ, gg, ^ 

My personal preference is for the second approach: it leads us back to the familiar 
terrain of a symmetric positive-definite operator, for which we can confidently apply 
Gauss-Seidel smoothing, the variational choices R = P* and A\ = P*A P, and so 
forth; the sole question is the usual one, namely the choice of the interpolation P. 

The only trouble with the operators A* A and AA* is that, according to my frame- 
work, they are meaningless] If A maps V — > V*, then A* maps V — > V*, so the 
combinations A* A and A A* are meaningless. Worse yet, there is no way of making 
a meaningful "squared" map V — > V* or V — > V* (remember that we want it to be 
positive-semidefinite, so it has to map a space into its dual) out of the ingredients 
A, A*, E, E*, E~ x and (E^ 1 )*] Rather, it seems that one may consider, with equal 
justice, any of the operators A*KA or ALA*, where K: V* — > V and L: V* — > V are 
symmetric and positive-definite but otherwise arbitrary. 



25 Equivalently, one can consider the vector space V © V equipped with a distinguished involution 
J: V © V -> V © V, where J =[ E Q 
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Luckily, this formalism need be applied only to the original (fine-grid) fermion 
operator, which is a Wilson or staggered fermion operator in a unitary gauge field and 
with a space-independent color-independent mass term. (The coarse-grid operators, 
which may well involve "dielectric" gauge fields, are all bosonic.) Therefore, it may be 
possible to justify the standard choice of K and L — namely, the operator specified by 
the identity matrix in the site- wise basis — as being in some precise sense "natural" . 
I leave this as a open problem. 

Some concluding remarks. The foregoing comments should not be interpreted as 
a dismissal of the existing literature on multigrid methods for propagators. In fact, 
I believe that the two major existing approaches — parallel-transported multigrid 
H, |7], H H, 10, H 11 H and ground-state-projection multigrid [|, |, |, g 
0, [13], [b§ [T§ [17|, |18l [Tg |20], [2T[ [24], [2§ — both contain valuable physical insights 
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that are likely to play a role in successful future algorithms. However, I believe that 
these methods must be rethought and reworked to bring them into conformity with 
the three basic conceptual principles expounded here. 

Some readers will undoubtedly feel that these conceptual principles — especially 
the distinction between V and V* — are mere "mathematical nit-picking", and are 
irrelevant to the practical task of designing an efficient algorithm. I take the opposite 
point of view: it seems to me that at the current stage of development, we have 
already too many proposals for multigrid algorithms and their ingredients, not too 
few; what we need now is to improve our understanding of what we are doing, so as 
to lay a solid foundation for future success in practice. Only time will tell. 



I wish to thank Radi Ben-Av, Rich Brower, Robert Edwards, Jonathan Goodman, 
Thomas Kalkreuter, Gerhard Mack, Sorin Solomon and Marcus Speh for numerous 
discussions over the past few years. This research was supported in part by National 
Science Foundation grant DMS-9200719. 
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